Linear-Time Computation of Prefix Table for Weighted Strings
Abstract
The prefix table of a string is one of the most fundamental data structures of algorithms on strings: it determines the longest factor at each position of the string that matches a prefix of the string. It can be computed in time linear with respect to the size of the string, and hence it can be used efficiently for locating patterns or for regularity searching in strings. A weighted string is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain strings, naturally arise in many biological contexts; for example, they provide a method to realise approximation among occurrences of the same DNA segment. In this article, given a weighted string x of length n and a constant cumulative weight threshold 1/z, defined as the minimal probability of occurrence of factors in x, we present an O(n)-time algorithm for computing the prefix table of x. Furthermore, we outline a number of applications of this result for solving various problems on non-standard strings, and present some preliminary experimental results.
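As a rough illustration of the two notions defined in the abstract, the sketch below computes the prefix table of an ordinary (non-weighted) string with the standard linear-time scan, and evaluates the occurrence probability of a factor in a weighted string as the product of per-position letter probabilities. It is not the article's O(n)-time algorithm for weighted strings. The function names, the 0-based indexing, and the list-of-dictionaries representation of a weighted string are assumptions made for this example only.

```python
from typing import Dict, List


def prefix_table(s: str) -> List[int]:
    """Return pi, where pi[i] is the length of the longest factor of s
    starting at position i (0-based, an assumption) that matches a prefix of s."""
    n = len(s)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n  # the whole string trivially matches itself
    left, right = 0, 0  # s[left:right] is the rightmost factor known to match a prefix
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the matching window.
            pi[i] = min(right - i, pi[i - left])
        # Extend the match letter by letter past what is already known.
        while i + pi[i] < n and s[pi[i]] == s[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def factor_probability(weighted: List[Dict[str, float]], start: int, factor: str) -> float:
    """Occurrence probability of `factor` at position `start` of a weighted string,
    modelled here (hypothetically) as a list of {letter: probability} dictionaries."""
    p = 1.0
    for offset, letter in enumerate(factor):
        p *= weighted[start + offset].get(letter, 0.0)
    return p


if __name__ == "__main__":
    print(prefix_table("aabaab"))
    w = [{"a": 1.0}, {"a": 0.5, "c": 0.5}, {"b": 1.0}]
    z = 4  # a factor counts as occurring only if its probability is at least 1/z
    print(factor_probability(w, 0, "acb") >= 1 / z)
```

In this model, a factor is considered to occur at a position only when its probability is at least the threshold 1/z, which is how the threshold in the abstract restricts the matches that a weighted prefix table may report.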
Similar resources
Enhanced Covers of Regular and Indeterminate Strings Using Prefix Tables
A cover of a string x = x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper [12] relaxes this definition — an enhanced cover u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a maximum number of positions in x (not necessarily all) — and proposes efficient algorithms for the computation o...
Enhanced Covers of Regular & Indeterminate Strings using Prefix Tables
A cover of a string x = x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper [12] relaxes this definition — an enhanced cover u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a maximum number of positions in x (not necessarily all) — and proposes efficient algorithms for the computation o...
Computing Covers Using Prefix Tables
An indeterminate string x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ; x is said to be regular if every subset is of size one. A proper substring u of regular x is said to be a cover of x iff for every i ∈ 1..n, an occurrence of u in x includes x[i]. The cover array γ = γ[1..n] of x is an integer array such that γ[i] is the longest cover of x[1..i]. Fifteen years ago a com...
Time Complexity of Knuth-Morris-Pratt String Matching Algorithm
This project centers on evaluating the time complexity of the Knuth-Morris-Pratt (KMP) string matching algorithm. The string matching problem is to locate a pattern string within a larger string. The best asymptotic time complexity currently known is linear, achieved by the KMP algorithm. In this algorithm, a prefix for the pattern string is first computed, and then based on this...
Improved Filters for the Approximate Suffix-Prefix Overlap Problem
Computing suffix-prefix overlaps for a large collection of strings is a fundamental building block for the analysis of genomic next-generation sequencing data. The approximate suffix-prefix overlap problem is to find all pairs of strings from a given set such that a prefix of one string is similar to a suffix of the other. Välimäki et al. (Information and Computation, 2012) gave a solution to t...
Journal:
- Theor. Comput. Sci.
Volume 656
Publication date: 2015